Method and system for inline deduplication using erasure coding

ABSTRACT

A method for storing data includes obtaining data, applying an erasure coding procedure to the data to obtain a plurality of data chunks and a parity chunk, deduplicating the plurality of data chunks to obtain a plurality of deduplicated data chunks, storing, across a plurality of nodes, the plurality of deduplicated data chunks and the parity chunk, and tracking location information for each of the plurality of deduplicated data chunks and the parity chunk.

BACKGROUND

Computing devices may include any number of internal components such as processors, memory, and persistent storage. Each of the internal components of a computing device may be used to generate data. The process of generating, storing, and backing up data may utilize computing resources of the computing devices such as processing and storage. The utilization of the aforementioned computing resources to generate backups may impact the overall performance of the computing resources.

SUMMARY

In general, in one aspect, the invention relates to a method for storing data in accordance with one or more embodiments of the invention. The method includes obtaining data, applying an erasure coding procedure to the data to obtain a plurality of data chunks and a parity chunk, deduplicating the plurality of data chunks to obtain a plurality of deduplicated data chunks, storing, across a plurality of nodes, the plurality of deduplicated data chunks and the parity chunk, and tracking location information for each of the plurality of deduplicated data chunks and the parity chunk.

In general, in one aspect, the invention relates to a non-transitory computer readable medium, in accordance with one or more embodiments of the invention, that includes computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for storing data. The method includes obtaining data, applying an erasure coding procedure to the data to obtain a plurality of data chunks and a parity chunk, deduplicating the plurality of data chunks to obtain a plurality of deduplicated data chunks, storing, across a plurality of nodes, the plurality of deduplicated data chunks and the parity chunk, and tracking location information for each of the plurality of deduplicated data chunks and the parity chunk.

In general, in one aspect, the invention relates to a data cluster. The data cluster includes a plurality of data nodes comprising an accelerator pool and a non-accelerator pool, wherein the accelerator pool comprises a data node, and the non-accelerator pool comprises a plurality of data nodes; wherein the data node of the accelerator pool is programmed to: obtain data, apply an erasure coding procedure to the data to obtain a plurality of data chunks and a parity chunk, deduplicate the plurality of data chunks to obtain a plurality of deduplicated data chunks, store, across a plurality of nodes, the plurality of deduplicated data chunks and the parity chunk, and track location information for each of the plurality of deduplicated data chunks and the parity chunk.

BRIEF DESCRIPTION OF DRAWINGS

Certain embodiments of the invention will be described with reference to the accompanying drawings. However, the accompanying drawings illustrate only certain aspects or implementations of the invention by way of example and are not meant to limit the scope of the claims.

FIG. 1A shows a diagram of a system in accordance with one or more embodiments of the invention.

FIG. 1B shows a diagram of a data cluster in accordance with one or more embodiments of the invention.

FIG. 2 shows a flowchart for storing data in a data cluster in accordance with one or more embodiments of the invention.

FIGS. 3A-3C show an example in accordance with one or more embodiments of the invention.

FIG. 4 shows a diagram of a computing device in accordance with one or more embodiments of the invention.

DETAILED DESCRIPTION

Specific embodiments will now be described with reference to the accompanying figures. In the following description, numerous details are set forth as examples of the invention. It will be understood by those skilled in the art that one or more embodiments of the present invention may be practiced without these specific details and that numerous variations or modifications may be possible without departing from the scope of the invention. Certain details known to those of ordinary skill in the art are omitted to avoid obscuring the description.

In the following description of the figures, any component described with regard to a figure, in various embodiments of the invention, may be equivalent to one or more like-named components described with regard to any other figure. For brevity, descriptions of these components will not be repeated with regard to each figure. Thus, each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components. Additionally, in accordance with various embodiments of the invention, any description of the components of a figure is to be interpreted as an optional embodiment, which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.

Throughout this application, elements of figures may be labeled as A to N. As used herein, the aforementioned labeling means that the element may include any number of items and does not require that the element include the same number of elements as any other item labeled as A to N. For example, a data structure may include a first element labeled as A and a second element labeled as N. This labeling convention means that the data structure may include any number of the elements. A second data structure, also labeled as A to N, may also include any number of elements. The number of elements of the first data structure and the number of elements of the second data structure may be the same or different.

In general, embodiments of the invention relate to a method and system for storing data in a data cluster. Embodiments of the invention may utilize a deduplicator, operating in an accelerator pool, which applies an erasure coding procedure on data obtained from a host to divide the data into data chunks and to generate parity chunks using the data chunks. The deduplicator may then perform deduplication on the data chunks to generate deduplicated data that includes deduplicated data chunks. The deduplicated data chunks and the parity chunks are subsequently distributed to nodes in the data cluster in accordance with an erasure coding procedure.

In one or more embodiments of the invention, the deduplicator stores storage information that specifies the nodes in which each data chunk and parity chunk is stored. In this manner, if the accelerator pool obtains data that includes modifications to previously stored data chunks, the modified data chunks may be sent to the appropriate nodes (i.e., the nodes on which prior versions of the specific data chunk or parity chunk are stored). As a result, embodiments of the invention minimize the number of read and write operations that are required to write erasure coded deduplicated data to the non-accelerator pool. Said another way, by tracking the node in the non-accelerator pool to which each data chunk and parity chunk is written, one or more embodiments of the invention enable only portions of a stripe (i.e., a set of data chunks and parity chunks) to be written to the non-accelerator pool when a portion of the stripe is modified. This results in fewer read and write operations being performed, as none of the previously stored, unmodified data chunks need to be read from or re-written to the non-accelerator pool.

FIG. 1A shows an example system in accordance with one or more embodiments of the invention. The system includes a host (100) and a data cluster (110). The host (100) is operably connected to the data cluster (110) via any combination of wired and/or wireless connections.

In one or more embodiments of the invention, the host (100) utilizes the data cluster (110) to store data. The data stored may be backups of databases, files, applications, and/or other types of data without departing from the invention.

In one or more embodiments of the invention, the host (100) is implemented as a computing device (see e.g., FIG. 4). The computing device may be, for example, a laptop computer, a desktop computer, a server, a distributed computing system, or a cloud resource (e.g., a third-party storage system accessible via a wired or wireless connection). The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The computing device may include instructions, stored on the persistent storage, that when executed by the processor(s) of the computing device cause the computing device to perform the functionality of the host (100) described throughout this application.

In one or more embodiments of the invention, the host (100) is implemented as a logical device. The logical device may utilize the computing resources of any number of computing devices and thereby provide the functionality of the host (100) described throughout this application.

In one or more embodiments of the invention, the data cluster (110) stores data and/or backups of data generated by the host (100). The data and/or backups may be deduplicated versions of data obtained from the host. The data cluster may, via an erasure coding procedure, store portions of the deduplicated data across the nodes operating in the data cluster (110).

As used herein, deduplication refers to methods of storing only portions of files (also referred to as file segments or segments) that are not already stored in persistent storage. For example, when multiple versions of a large file, having only minimal differences between each of the versions, are stored without deduplication, storing each version will require approximately the same amount of storage space of a persistent storage. In contrast, when the multiple versions of the large file are stored with deduplication, only the first version of the multiple versions stored will require a substantial amount of storage. Once the first version is stored in the persistent storage, the subsequent versions of the large file subsequently stored will be de-duplicated before being stored in the persistent storage, resulting in much less storage space of the persistent storage being required to store the subsequently stored versions when compared to the amount of storage space of the persistent storage required to store the first stored version.
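The storage savings described above can be illustrated with a minimal Python sketch. It assumes fixed-size segments and SHA-256 fingerprints, neither of which is specified here, and the names `SegmentStore`, `segments`, and `SEGMENT_SIZE` are hypothetical.

```python
import hashlib

SEGMENT_SIZE = 4  # bytes per segment (hypothetical; tiny so the example is easy to follow)


def segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Split a file into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class SegmentStore:
    """Toy persistent storage that only keeps segments it has not seen before."""

    def __init__(self) -> None:
        self.by_fingerprint: dict[str, bytes] = {}  # fingerprint -> segment

    def store(self, data: bytes) -> int:
        """Store one file version and return how many new bytes were written."""
        written = 0
        for seg in segments(data):
            fp = hashlib.sha256(seg).hexdigest()
            if fp not in self.by_fingerprint:
                self.by_fingerprint[fp] = seg
                written += len(seg)
        return written


store = SegmentStore()
print(store.store(b"AAAABBBBCCCCDDDD"))  # 16: the first version is stored in full
print(store.store(b"AAAABBBBXXXXDDDD"))  # 4: only the changed segment is new
```

The second version differs from the first in a single segment, so only that segment consumes new space.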

Continuing with the discussion of FIG. 1A, the data cluster (110) may include nodes that each store any number of portions of data. The portions of data may be obtained by other nodes or obtained from the host (100). For additional details regarding the data cluster (110), see, e.g., FIG. 1B.

FIG. 1B shows a diagram of a data cluster (120) in accordance with one or more embodiments of the invention. The data cluster (120) may be an embodiment of the data cluster (110, FIG. 1A) discussed above. The data cluster (120) may include an accelerator pool (130) and a non-accelerator pool (150). The accelerator pool (130) may include a deduplicator(s) (132) and any number of data nodes (134, 136). Similarly, the non-accelerator pool (150) includes any number of data nodes (154, 156). The components of the data cluster (120) may be operably connected via any combination of wired and/or wireless connections. Each of the aforementioned components is discussed below.

In one or more embodiments of the invention, the deduplicator(s) (132) is a device that includes functionality to perform deduplication on data obtained from a host (e.g., 100, FIG. 1A). The deduplicator (132) may store information useful to perform the aforementioned functionality. The information may include deduplication identifiers (D-IDs). A D-ID is a unique identifier that identifies portions of the data (also referred to as data chunks) that are stored in the data cluster (120). The D-ID may be used to determine whether a data chunk of the obtained data is already present elsewhere in the accelerator pool (130) or the non-accelerator pool (150). The deduplicator (132) may use the information to perform the deduplication and generate deduplicated data (or a deduplicated backup). Before deduplication, an erasure coding procedure may be performed on the obtained data in order to generate the data chunks and parity chunks. The deduplicator (132) may perform the erasure coding procedure and the deduplication via the method illustrated in FIG. 2.

In one or more embodiments of the invention, the deduplicator (132) is implemented as computer instructions, e.g., computer code, stored on a persistent storage that when executed by a processor of a data node (e.g., 134, 136) of the accelerator pool (130) cause the data node to provide the aforementioned functionality of the deduplicator (132) described throughout this application and/or all, or a portion thereof, of the method illustrated in FIG. 2.

In one or more embodiments of the invention, the deduplicator (132) is implemented as a computing device (see e.g., FIG. 4). The computing device may be, for example, a laptop computer, a desktop computer, a server, a distributed computing system, or a cloud resource (e.g., a third-party storage system accessible via a wired or wireless connection). The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The computing device may include instructions, stored on the persistent storage, that when executed by the processor(s) of the computing device cause the computing device to perform the functionality of the deduplicator (132) described throughout this application and/or all, or a portion thereof, of the method illustrated in FIG. 2.

In one or more embodiments of the invention, the deduplicator (132) is implemented as a logical device. The logical device may utilize the computing resources of any number of computing devices and thereby provide the functionality of the deduplicator (132) described throughout this application and/or all, or a portion thereof, of the method illustrated in FIG. 2.

Continuing with the discussion of FIG. 1B, different data nodes in the cluster may include different quantities and/or types of computing resources, e.g., processors providing processing resources, memory providing memory resources, storages providing storage resources, communicators providing communications resources. Thus, the system may include a heterogeneous population of nodes.

The heterogeneous population of nodes may be logically divided into an accelerator pool (130) including nodes that have more computing resources, e.g., high performance nodes (134, 136), than other nodes, and a non-accelerator pool (150) including nodes that have fewer computing resources, e.g., low performance nodes (154, 156), than the nodes in the accelerator pool (130). For example, nodes of the accelerator pool (130) may include enterprise class solid state storage resources that provide very high storage bandwidth, low latency, and high input/output operations per second (IOPS). In contrast, the nodes of the non-accelerator pool (150) may include hard disk drives that provide lower storage performance. While illustrated in FIG. 1B as being divided into two groups, the nodes may be divided into any number of groupings based on the relative performance level of each node without departing from the invention.

In one or more embodiments of the invention, the data nodes (134, 136, 154, 156) store data chunks and parity chunks. The data nodes (134, 136, 154, 156) may include persistent storage that may be used to store the data chunks and parity chunks. The generation of the data chunks and parity chunks is described below with respect to FIG. 2.

In one or more embodiments of the invention, the non-accelerator pool (150) includes any number of fault domains. In one or more embodiments of the invention, a fault domain is a logical grouping of nodes (e.g., data nodes) that, when one node of the logical grouping of nodes goes offline and/or otherwise becomes inaccessible, the other nodes in the logical grouping of nodes are directly affected. The effect of the node going offline on the other nodes may include the other nodes also going offline and/or otherwise becoming inaccessible. The non-accelerator pool (150) may include multiple fault domains. In this manner, the events of one fault domain in the non-accelerator pool (150) may have no effect on other fault domains in the non-accelerator pool (150).

For example, two data nodes may be in a first fault domain. If one of these data nodes in the first fault domain experiences an unexpected shutdown, other nodes in the first fault domain may be affected. In contrast, another data node in a second fault domain may not be affected by the unexpected shutdown of a data node in the first fault domain. In one or more embodiments of the invention, the unexpected shutdown of one fault domain does not affect the nodes of other fault domains. In this manner, data may be replicated and stored across multiple fault domains to allow high availability of the data.
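As an illustration of placing the chunks of a stripe so that the failure of one fault domain affects at most one chunk, the following is a minimal sketch. The node-to-fault-domain map, the node names, and the helpers `place_stripe` and `chunks_lost` are hypothetical; the sketch simply picks one node per fault domain.

```python
def place_stripe(chunk_ids: list[str], node_domains: dict[str, str]) -> dict[str, str]:
    """Assign each chunk of a stripe to a node in a different fault domain.

    node_domains maps a (hypothetical) node id to its fault domain id.
    """
    # Keep one candidate node per fault domain (first node in sorted order).
    one_node_per_domain: dict[str, str] = {}
    for node, domain in sorted(node_domains.items()):
        one_node_per_domain.setdefault(domain, node)
    if len(one_node_per_domain) < len(chunk_ids):
        raise ValueError("not enough fault domains to separate every chunk")
    targets = sorted(one_node_per_domain.values())
    return dict(zip(chunk_ids, targets))


def chunks_lost(placement: dict[str, str], node_domains: dict[str, str],
                failed_domain: str) -> list[str]:
    """List the chunks that become inaccessible when one fault domain fails."""
    return [chunk for chunk, node in placement.items()
            if node_domains[node] == failed_domain]


node_domains = {"node-1": "fd-1", "node-2": "fd-1", "node-3": "fd-2",
                "node-4": "fd-3", "node-5": "fd-4"}
placement = place_stripe(["D1", "D2", "D3", "P1"], node_domains)
print(placement)  # {'D1': 'node-1', 'D2': 'node-3', 'D3': 'node-4', 'P1': 'node-5'}
print(chunks_lost(placement, node_domains, "fd-1"))  # ['D1']
```

Because node-1 and node-2 share fault domain fd-1, only one of them is used for the stripe; with a single parity chunk, losing any one fault domain then costs at most one chunk.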

In one or more embodiments of the invention, each data node (134, 136, 154, 156) is implemented as a computing device (see e.g., FIG. 4). The computing device may be, for example, a laptop computer, a desktop computer, a server, a distributed computing system, or a cloud resource (e.g., a third-party storage system accessible via a wired or wireless connection). The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The computing device may include instructions, stored on the persistent storage, that when executed by the processor(s) of the computing device cause the computing device to perform the functionality of the data node (134, 136, 154, 156) described throughout this application and/or all, or a portion thereof, of the method illustrated in FIG. 2.

In one or more embodiments of the invention, the data nodes (134, 136, 154, 156) are implemented as a logical device. The logical device may utilize the computing resources of any number of computing devices and thereby provide the functionality of the data nodes (134, 136, 154, 156) described throughout this application and/or all, or a portion thereof, of the method illustrated in FIG. 2.

FIG. 2 shows a flowchart for storing data in a data cluster in accordance with one or more embodiments of the invention. The method shown in FIG. 2 may be performed by, for example, a deduplicator (132, FIG. 1B). Other components of the system illustrated in FIG. 1B may perform the method of FIG. 2 without departing from the invention. While the various steps in the flowchart are presented and described sequentially, one of ordinary skill in the relevant art will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all steps may be executed in parallel.

In step 200, data is obtained from a host. The data may be a file, a file segment, a collection of files, or any other type of data without departing from the invention. The data may be obtained in response to a request to store data and/or back up the data. Other requests may be used to initiate the method without departing from the invention.

In step 202, confirmation is sent to the host. In one or more embodiments of the invention, the confirmation is an acknowledgement (ACK) that confirms receipt of the data by the data cluster. At this stage, from the perspective of the host, the data has been backed up. This is the case even though the data cluster is still performing the method shown in FIG. 2.

In step 204, an erasure coding procedure is performed on the data to generate data chunks and parity chunks. In one or more embodiments of the invention, the erasure coding procedure includes dividing the obtained data into portions, referred to as data chunks. Each data chunk may include any number of data segments associated with the obtained data. The individual data chunks may then be combined (or otherwise grouped) into stripes (also referred to as Redundant Array of Independent Disks (RAID) stripes). One or more parity values are then calculated for each of the aforementioned stripes. The number of parity values may vary based on the erasure coding algorithm that is being used as part of the erasure coding procedure. Non-limiting examples of erasure coding algorithms are RAID-4, RAID-5, and RAID-6. Other erasure coding algorithms may be used without departing from the invention. Continuing with the above discussion, if the erasure coding procedure is implementing RAID-4, then a single parity value is calculated. The resulting parity value is then stored in a parity chunk. If the erasure coding algorithm requires multiple parity values to be calculated, then the multiple parity values are calculated, with each parity value being stored in a separate parity chunk.
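The division and grouping steps can be sketched in Python as follows. This is a minimal sketch that assumes fixed-size, zero-padded chunks and the 3:1 layout used in the example later in this description; `CHUNK_SIZE`, `DATA_CHUNKS_PER_STRIPE`, `split_into_chunks`, and `group_into_stripes` are hypothetical names, and parity calculation is left out.

```python
CHUNK_SIZE = 4              # bytes per data chunk (hypothetical, tiny for illustration)
DATA_CHUNKS_PER_STRIPE = 3  # the "3" in a 3:1 erasure coding scheme


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Divide data into equal-size chunks, zero-padding the last one.

    A real system would also record the original length so padding can be stripped.
    """
    return [data[i:i + chunk_size].ljust(chunk_size, b"\x00")
            for i in range(0, len(data), chunk_size)]


def group_into_stripes(chunks: list[bytes],
                       k: int = DATA_CHUNKS_PER_STRIPE) -> list[list[bytes]]:
    """Group chunks into stripes of k data chunks, filling the last stripe with zero chunks."""
    stripes = []
    for i in range(0, len(chunks), k):
        stripe = chunks[i:i + k]  # slicing creates a new list, so the input is untouched
        stripe += [bytes(len(chunks[0]))] * (k - len(stripe))
        stripes.append(stripe)
    return stripes


stripes = group_into_stripes(split_into_chunks(b"ABCDEFGHIJKLMN"))
print(len(stripes))  # 2
print(stripes[0])    # [b'ABCD', b'EFGH', b'IJKL']
```

Fixed-size chunks keep every member of a stripe the same length, which the byte-wise parity functions described next rely on.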

As discussed above, the data chunks are used to generate parity chunks in accordance with the erasure coding procedure. More specifically, the parity chunks may be generated by applying a predetermined function (e.g., P Parity function, Q Parity Function), operation, or calculation to at least one of the data chunks. Depending on the erasure coding procedure used, the parity chunks may include, but are not limited to, P parity values and/or Q parity values.

In one embodiment of the invention, the P parity value is a Reed-Solomon syndrome and, as such, the P Parity function may correspond to any function that can generate a Reed-Solomon syndrome. In one embodiment of the invention, the P parity function is an XOR function.
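A minimal sketch of an XOR-based P parity function, as used for the single parity value of RAID-4 (and RAID-5), is shown below. `xor_bytes` and `p_parity` are hypothetical helper names, and chunks are assumed to be equal-length byte strings.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks in a stripe must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def p_parity(data_chunks: list[bytes]) -> bytes:
    """P parity: XOR of every data chunk in the stripe."""
    return reduce(xor_bytes, data_chunks)


stripe = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = p_parity(stripe)
print(parity.hex())  # 152a
# XOR-ing the parity with all data chunks yields zeros, which is what makes rebuilds possible.
print(p_parity(stripe + [parity]).hex())  # 0000
```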

In one embodiment of the invention, the Q parity value is a Reed-Solomon syndrome and, as such, the Q Parity function may correspond to any function that can generate a Reed-Solomon syndrome. In one embodiment of the invention, a Q parity value is a Reed-Solomon code. In one embodiment of the invention, $Q = g^{0} \cdot D_{0} + g^{1} \cdot D_{1} + g^{2} \cdot D_{2} + \cdots + g^{n-1} \cdot D_{n-1}$, where Q corresponds to the Q parity, g is a generator of the field, and $D_{i}$ corresponds to the data in the $i$-th data chunk.
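The Q parity formula, read as a sum of powers of the generator g (the usual RAID-6 interpretation, assumed here), can be evaluated byte by byte in the finite field GF(2^8), where field addition is XOR. The sketch below further assumes g = 2 and the reducing polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d), a common RAID-6 choice that the text does not mandate; `gf_mul`, `gf_pow`, and `q_parity` are hypothetical names.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements, reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # field addition is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D   # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(g: int, n: int) -> int:
    """Compute g^n in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, g)
    return result


def q_parity(data_chunks: list[bytes], g: int = 2) -> bytes:
    """Q = g^0*D_0 + g^1*D_1 + ... + g^(n-1)*D_(n-1), evaluated byte-wise in GF(2^8)."""
    q = bytearray(len(data_chunks[0]))
    for i, chunk in enumerate(data_chunks):
        coeff = gf_pow(g, i)
        for j, byte in enumerate(chunk):
            q[j] ^= gf_mul(coeff, byte)
    return bytes(q)


print(q_parity([b"\x01", b"\x01", b"\x01"]).hex())  # 07 (1 ^ 2 ^ 4)
print(q_parity([b"\x01\x80", b"\x00\x80"]).hex())   # 019d
```

Unlike P parity, each data chunk is weighted by a distinct power of g, which is what allows two lost chunks to be solved for when P and Q are used together.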

In one or more embodiments of the invention, the number of data chunks and parity chunks generated is determined by the erasure coding procedure, which may be specified by the host, by the data cluster, and/or by another entity.

In step 206, deduplication is performed on the data chunks to obtain deduplicated data chunks. In one or more embodiments of the invention, the deduplication is performed in the accelerator pool by identifying the data chunks of the obtained data and assigning a fingerprint to each data chunk. A fingerprint is a unique identifier (e.g., a D-ID) that may be stored in metadata of the data chunk. The deduplicator performing the deduplication may generate a fingerprint for a data chunk and identify whether the fingerprint matches an existing fingerprint stored in the deduplicator. If the fingerprint matches an existing fingerprint, the data chunk may be deleted, as it is already stored in the data cluster. If the fingerprint does not match any existing fingerprints, the data chunk may be stored as a deduplicated data chunk. Additionally, the fingerprint is stored in the deduplicator for use when deduplicating subsequently obtained data.
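The fingerprint check of step 206 can be sketched as an in-memory index of D-IDs. This is a minimal sketch assuming SHA-256 fingerprints; the `Deduplicator` class and its methods are hypothetical, and a real deduplicator would persist its index.

```python
import hashlib


class Deduplicator:
    """Hypothetical deduplicator that keeps its D-ID index in memory."""

    def __init__(self) -> None:
        self.known_d_ids: set[str] = set()

    @staticmethod
    def fingerprint(chunk: bytes) -> str:
        # SHA-256 is an assumption; the text only requires a unique identifier.
        return hashlib.sha256(chunk).hexdigest()

    def deduplicate(self, chunks: list[bytes]) -> list[tuple[str, bytes]]:
        """Return (D-ID, chunk) for each chunk not seen before and remember its D-ID."""
        kept = []
        for chunk in chunks:
            d_id = self.fingerprint(chunk)
            if d_id in self.known_d_ids:
                continue  # already stored in the data cluster: drop this copy
            self.known_d_ids.add(d_id)
            kept.append((d_id, chunk))
        return kept


dedup = Deduplicator()
print(len(dedup.deduplicate([b"A0", b"A1", b"A2"])))              # 3
print([c for _, c in dedup.deduplicate([b"A0", b"A1'", b"A2"])])  # [b"A1'"]
```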

In one or more embodiments of the invention, the deduplicated data chunks collectively make up the deduplicated data. In one or more embodiments of the invention, the deduplicated data chunks are the data chunks that were not deleted during deduplication.

In step 208, the deduplicated data chunks and parity chunks are stored across data nodes in different fault domains in a non-accelerator pool. As discussed above, the deduplicated data chunks and the parity chunks are stored in a manner that minimizes reads from and writes to the non-accelerator pool. In one embodiment of the invention, this minimization is achieved by storing data chunks and parity chunks, which are collectively referred to as a stripe, in the same manner as a prior version of the stripe. The deduplicator may use, as appropriate, location information for the previously stored data chunks and parity chunks to determine where to store the data chunks and parity chunks in step 208.

More specifically, in one embodiment of the invention, if the deduplicated data chunks and parity chunks are the first version of a stripe (as opposed to a modification to an existing/previously stored stripe), then the deduplicated data chunks and parity chunks may be stored across the nodes (each in a different fault domain) in the non-accelerator pool. The location (or in this case the specific node) in which each data chunk or parity chunk is stored is tracked by the deduplicator. This scenario does not require the deduplicator to use location information for previously stored data chunks and parity chunks.

However, if the deduplicated data chunks and parity chunks are the second version of a stripe (e.g., a modification to a previously stored stripe), then the deduplicated data chunks and parity chunks are stored across the nodes (each in a different fault domain) in the non-accelerator pool using the previously stored location information. The location (or in this case the specific node and/or fault domain) in which each data chunk or parity chunk is stored is tracked by the deduplicator.

For example, consider a scenario in which the first version of the stripe includes three data chunks (D1, D2, D3) and one parity chunk (P1), and that they were stored as follows: Node 1 stores D1, Node 2 stores D2, Node 3 stores D3, and Node 4 stores P1. Further, in this example, a second version of the stripe is received that includes three data chunks (D1, D2′, D3) and one newly calculated parity chunk (P1′). After deduplication, only D2′ and P1′ need to be stored. Based on the prior storage locations (also referred to as locations) of the data chunks (D1, D2, and D3) and the parity chunk (P1) of the first version of the stripe, D2′ is stored on Node 2 and P1′ is stored on Node 4. By storing D2′ on Node 2 and P1′ on Node 4, the data chunks and parity chunks associated with the second version of the stripe satisfy the condition that all data chunks and parity chunks for the second version of the stripe are stored in separate fault domains. If the location information were not taken into account, then the entire stripe (i.e., D1, D2′, D3, and P1′) would need to be stored in order to guarantee that this requirement is satisfied.
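The placement rule in this example can be sketched as follows: each chunk that survives deduplication is sent to the node that held the same stripe position in the prior version. This is a minimal sketch; the position-ordered layout list and the name `plan_stripe_writes` are assumptions made for illustration, and ASCII apostrophes stand in for the primes in D2′ and P1′.

```python
def plan_stripe_writes(prior_layout: list[str], new_stripe: list[str],
                       must_store: set[str]) -> dict[str, str]:
    """Map each chunk that must be written to the node used for its stripe position.

    prior_layout: node ids by stripe position for the prior version of the stripe.
    new_stripe:   chunk ids by stripe position for the new version.
    must_store:   chunk ids that remain after deduplication.
    """
    if len(prior_layout) != len(new_stripe):
        # A different stripe width cannot reuse the prior layout position by position.
        raise ValueError("stripe width changed; the stripe needs a fresh placement")
    return {chunk: node for chunk, node in zip(new_stripe, prior_layout)
            if chunk in must_store}


prior_layout = ["Node 1", "Node 2", "Node 3", "Node 4"]  # held D1, D2, D3, P1
writes = plan_stripe_writes(prior_layout, ["D1", "D2'", "D3", "P1'"], {"D2'", "P1'"})
print(writes)  # {"D2'": 'Node 2', "P1'": 'Node 4'}
print(len(writes), "of", len(prior_layout), "chunks written")  # 2 of 4 chunks written
```

Since every position of the prior layout already sits in its own fault domain, reusing the layout keeps the one-chunk-per-fault-domain property without rewriting D1 and D3.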

In one or more embodiments of the invention, if a data node obtains a deduplicated data chunk that is a modified version of a previously stored deduplicated data chunk, then the data node may: (i) store the modified version of the deduplicated data chunk (i.e., the data node would include two versions of the data chunk); or (ii) store the modified version of the deduplicated data chunk and delete the prior version of the deduplicated data chunk.

In one embodiment of the invention, the deduplicator includes functionality to determine whether a given data chunk is a modified version of a previously stored data chunk. Said another way, after the data received from a host is divided into data chunks and grouped into stripes, the deduplicator includes functionality to determine whether a stripe is a modified version of a previously stored stripe. The deduplicator may use the fingerprints of the data chunks within the stripe to determine whether the stripe is a modified version of a previously stored stripe. Other methods may be used to determine whether a data chunk is a modified version of a previously stored data chunk and/or whether a stripe is a modified version of a prior stripe without departing from the invention.
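One possible way to use fingerprints for this determination is to compare them position by position against previously stored stripes. The heuristic below (pick the stored stripe of the same width that shares the most fingerprints, requiring at least one match) is an assumption for illustration, as are the name `find_prior_version` and the string fingerprints.

```python
from typing import Optional


def find_prior_version(new_fps: list[str], stored_stripes: dict[str, list[str]],
                       min_shared: int = 1) -> Optional[str]:
    """Return the id of the stored stripe that the new stripe most likely modifies."""
    best_id, best_shared = None, 0
    for stripe_id, old_fps in stored_stripes.items():
        if len(old_fps) != len(new_fps):
            continue  # different widths are treated as unrelated stripes
        shared = sum(1 for old, new in zip(old_fps, new_fps) if old == new)
        if shared > best_shared:
            best_id, best_shared = stripe_id, shared
    return best_id if best_shared >= min_shared else None


stored = {"stripe-A": ["fp-A0", "fp-A1", "fp-A2"],
          "stripe-X": ["fp-X0", "fp-X1", "fp-X2"]}
print(find_prior_version(["fp-A0", "fp-A1-new", "fp-A2"], stored))  # stripe-A
print(find_prior_version(["fp-Q0", "fp-Q1", "fp-Q2"], stored))      # None
```

A stripe that matches no stored stripe is treated as a first version and placed freshly; a match means the prior stripe's location information can be reused.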

In step 210, location information in the deduplicator is updated using the locations of the deduplicated data chunks and parity chunks. The location (or storage location) may be specified using a node identifier, a fault domain identifier (i.e., the fault domain in which the node storing the data chunk or parity chunk is located), or any other type of identifying information. The location information may be stored along with other chunk metadata, which may include, but is not limited to, a chunk type (e.g., data chunk or parity chunk), a deduplicated data chunk identifier (e.g., a D-ID) or parity chunk identifier (which may be generated for a parity chunk in the same manner as a D-ID for a data chunk), and erasure coding information (e.g., information about the erasure coding procedure, such as the erasure coding algorithm).
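A location-information entry of the kind described in step 210 might be modeled as below. This is a minimal sketch; the field names, the `ChunkType` enum, and the `LocationTable` class are hypothetical, and the erasure coding information is reduced to a plain string.

```python
from dataclasses import dataclass
from enum import Enum


class ChunkType(Enum):
    DATA = "data"
    PARITY = "parity"


@dataclass(frozen=True)
class ChunkLocation:
    chunk_id: str         # D-ID of a data chunk, or the parity chunk identifier
    chunk_type: ChunkType
    node_id: str          # node storing the chunk
    fault_domain_id: str  # fault domain of that node
    erasure_coding: str   # e.g., "RAID-4, 3:1"


class LocationTable:
    """In-memory location information kept by the deduplicator (illustrative only)."""

    def __init__(self) -> None:
        self._entries: dict[str, ChunkLocation] = {}

    def update(self, entry: ChunkLocation) -> None:
        # Step 210: record (or overwrite) where the chunk now lives.
        self._entries[entry.chunk_id] = entry

    def lookup(self, chunk_id: str) -> ChunkLocation:
        return self._entries[chunk_id]


table = LocationTable()
table.update(ChunkLocation("d-id-A1", ChunkType.DATA, "node-B", "fd-2", "RAID-4, 3:1"))
print(table.lookup("d-id-A1").node_id)  # node-B
```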

As discussed above, the data chunks and parity chunks may be stored in different fault domains. Storing the data chunks and parity chunks in multiple fault domains may be for recovery purposes. In the event that one or more fault domains storing data chunks or parity chunks become inaccessible, the data chunks and/or parity chunks stored in the remaining fault domains may be used to recreate the inaccessible data. In one embodiment of the invention, as part of (or in addition to) the chunk metadata, the deduplicator (or other computing device or logical device) tracks the members of each stripe (i.e., which data chunks and which parity chunks are part of a stripe). This information may be used to aid in any recovery operation that is required to be performed on the data stored in the data cluster.
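Assuming a single XOR parity chunk per stripe (as with RAID-4), the tracked stripe membership is enough to rebuild one inaccessible chunk from the surviving chunks. The function name `rebuild_missing` and its inputs are hypothetical; rebuilding two lost chunks would additionally require Q parity.

```python
from functools import reduce


def rebuild_missing(stripe_members: list[str],
                    available: dict[str, bytes]) -> tuple[str, bytes]:
    """Rebuild the single missing chunk of a stripe protected by XOR (P) parity.

    stripe_members: ids of every data chunk and the parity chunk in the stripe.
    available:      chunk id -> contents for the chunks that are still reachable.
    """
    missing = [c for c in stripe_members if c not in available]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing chunk")
    survivors = [available[c] for c in stripe_members if c in available]
    # XOR of all surviving chunks (data and parity) equals the missing chunk.
    rebuilt = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), survivors)
    return missing[0], rebuilt


members = ["D1", "D2", "D3", "P1"]
reachable = {"D1": b"\x01\x02", "D3": b"\x10\x20", "P1": b"\x15\x2a"}
name, data = rebuild_missing(members, reachable)
print(name, data.hex())  # D2 0408
```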

In one embodiment of the invention, the data that is originally obtained in step 200 and/or the deduplicated chunks obtained in step 206 may be: (i) stored on a node in the accelerator pool for a finite period of time (e.g., until it is determined, based on a policy, that this data is no longer required in the accelerator pool); or (ii) stored on a node in the accelerator pool until the end of step 208 and then deleted from the accelerator pool.

EXAMPLE

The following section describes an example. The example is not intended to limit the invention. The example is illustrated in FIGS. 3A-3C. Turning to the example, consider a scenario in which a data cluster obtains two backups from a single host at two points in time. The host may request that the backups be stored in the data cluster in a 3:1 erasure coding scheme. FIG. 3A shows a diagram of the two backups at the two points in time. Backup A (300) may be obtained at a first point in time T=1. Backup A (300) includes data that may be divided into data chunks A0 (302), A1 (304), and A2 (306). At a second point in time T=2, the data cluster obtains a second backup, backup B (310), that includes data that may be divided into data chunks A0 (312), A1′ (314), and A2 (316).

In this example, backup B (310) is a modified version of backup A (300). Accordingly, assume that the data associated with data chunk A0 (312) of backup B (310) is identical to the data associated with data chunk A0 (302) of backup A (300). Similarly, the data associated with data chunk A2 (316) of backup B (310) is identical to the data associated with data chunk A2 (306) of backup A (300). In contrast, the data associated with data chunk A1′ (314) of backup B (310) is an update of data chunk A1 (304) of backup A (300). Finally, in this example, assume that the erasure coding procedure includes implementing RAID-4.

FIG. 3B shows the data cluster after backup A (300) is processed in accordance with FIG. 2. The data cluster may include an accelerator pool (320) that performs the method of FIG. 2 to generate deduplicated backup A (322) using backup A (300). The method may include dividing the backup into data chunks A0, A1, and A2, where these data chunks are associated with a first stripe. The aforementioned data chunks are then used to generate a parity chunk AP1 using RAID-4.

Because the deduplicated backup A (322) is the first backup stored inthe data cluster, all three data chunks are distributed across nodes inthe non-accelerator pool (330) as deduplicated data chunks (322A, 322B,322C). Deduplicated data chunk A0 (322A) may be stored in a node A(332), deduplicated data chunk A1 (322B) may be stored in a node B(334), deduplicated data chunk A2 (322C) may be stored in a node C(336), and parity chunk AP1 (322D) may be stored in a node D (338). Eachnode (332, 334, 336, 338) may be a node in a unique fault domain. Inthis manner, each chunk (322A, 322B, 322C, 322D) is stored in adifferent fault domain.

The location of each deduplicated data chunk (322A, 322B, 322C) and of the parity chunk (322D) is stored in the deduplicator of the accelerator pool (320) as location information. The location information may include entries that each specify a deduplicated data chunk (322A, 322B, 322C) or the parity chunk AP1 (322D) and the data node (332, 334, 336, 338) storing the respective chunk.
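One way to picture the location information is as a table keyed by chunk identifier. The sketch below assumes an in-memory dictionary held by the deduplicator. `LocationEntry` and `LocationInfo` are hypothetical names, and a production system would likely persist this table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationEntry:
    chunk_id: str          # e.g. "A0" for a data chunk or "AP1" for a parity chunk
    node: str              # data node in the non-accelerator pool storing the chunk
    is_parity: bool = False


class LocationInfo:
    """Hypothetical location table maintained by the deduplicator."""

    def __init__(self) -> None:
        self._entries: dict[str, LocationEntry] = {}

    def record(self, chunk_id: str, node: str, is_parity: bool = False) -> None:
        self._entries[chunk_id] = LocationEntry(chunk_id, node, is_parity)

    def node_of(self, chunk_id: str) -> str:
        return self._entries[chunk_id].node

    def forget(self, chunk_id: str) -> None:
        del self._entries[chunk_id]


location = LocationInfo()
for chunk_id, node in [("A0", "node A"), ("A1", "node B"), ("A2", "node C")]:
    location.record(chunk_id, node)
location.record("AP1", "node D", is_parity=True)
```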

At the second point in time T=2, backup B (310) is obtained by the accelerator pool (320). Backup B (310) may be divided into data chunks A0, A1′, and A2, where these data chunks are associated with a second stripe that is a modified version of the first stripe. The data chunks (A0, A1′, A2) may be used to generate a parity chunk AP1′. The data chunks in the second stripe are then deduplicated by the deduplicator. Because data chunks A0 and A2 already exist in the non-accelerator pool, the deduplication removes them from backup B.
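Deduplication of the second stripe can be sketched by fingerprinting each data chunk and dropping any chunk whose fingerprint is already known. The sketch assumes SHA-256 content fingerprints, which the text does not specify. The name `deduplicate_stripe` and the chunk contents are hypothetical. Only data chunks are passed in, matching the example, in which the data chunks of the second stripe are deduplicated.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint of a chunk (SHA-256 is an assumption, not from the text)."""
    return hashlib.sha256(chunk).hexdigest()


def deduplicate_stripe(stripe: dict[str, bytes], stored: set[str]) -> dict[str, bytes]:
    """Return only the data chunks whose fingerprints are not already stored."""
    return {label: data for label, data in stripe.items() if fingerprint(data) not in stored}


# Fingerprints of chunks already held in the non-accelerator pool (from backup A).
stored = {fingerprint(c) for c in (b"A0-data", b"A1-data", b"A2-data")}
stripe_b = {"A0": b"A0-data", "A1'": b"A1-data-updated", "A2": b"A2-data"}
remaining = deduplicate_stripe(stripe_b, stored)  # {"A1'": b"A1-data-updated"}
```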

The remaining chunks associated with deduplicated backup B (324) may be stored in nodes of the non-accelerator pool (330) as deduplicated data chunk A1′ (324A) and parity chunk AP1′ (324B). The accelerator pool (320) may use the location information, which specifies the locations of deduplicated data chunks (322A, 322B, 322C) and parity chunk (322D) of deduplicated backup A (322), to determine where to store deduplicated data chunk (324A) and parity chunk (324B) of deduplicated backup B (324).

Using the location information, deduplicated data chunk A1′ (324A) is stored in node B (334), where deduplicated data chunk A1 (322B) is stored. Subsequently, deduplicated data chunk A1 (322B) may be deleted from node B (334). Similarly, parity chunk AP1′ (324B) is stored in node D (338), and parity chunk AP1 (322D) may then be deleted from node D (338).
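This placement step can be sketched as a lookup in the location information, followed by a write and a delete on the same node. The dictionaries standing in for the location information and for node storage are hypothetical, as is the name `replace_on_same_node`. The sketch writes the new chunk before deleting the old one, mirroring the order in the example.

```python
def replace_on_same_node(
    location: dict[str, str],
    storage: dict[str, dict[str, bytes]],
    old_id: str,
    new_id: str,
    new_data: bytes,
) -> str:
    """Store new_id on the node holding old_id, then delete old_id from that node."""
    node = location[old_id]            # where the superseded chunk lives
    storage[node][new_id] = new_data   # write the modified chunk first
    del storage[node][old_id]          # then remove the superseded chunk
    location[new_id] = node            # track the new chunk's location
    del location[old_id]
    return node


location = {"A0": "node A", "A1": "node B", "A2": "node C", "AP1": "node D"}
storage = {node: {cid: b"..."} for cid, node in location.items()}
replace_on_same_node(location, storage, "A1", "A1'", b"A1-updated")     # -> "node B"
replace_on_same_node(location, storage, "AP1", "AP1'", b"AP1-updated")  # -> "node D"
```

After both calls, A0, A1′, A2, and AP1′ still occupy four distinct nodes. The second stripe therefore keeps the one-chunk-per-fault-domain layout of the first.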

End of Example

As discussed above, embodiments of the invention may be implemented using computing devices. FIG. 4 shows a diagram of a computing device in accordance with one or more embodiments of the invention. The computing device (400) may include one or more computer processors (402), non-persistent storage (404) (e.g., volatile memory, such as random access memory (RAM), cache memory), persistent storage (406) (e.g., a hard disk, an optical drive such as a compact disk (CD) drive or digital versatile disk (DVD) drive, a flash memory, etc.), a communication interface (412) (e.g., Bluetooth interface, infrared interface, network interface, optical interface, etc.), input devices (410), output devices (408), and numerous other elements (not shown) and functionalities. Each of these components is described below.

In one embodiment of the invention, the computer processor(s) (402) may be an integrated circuit for processing instructions. For example, the computer processor(s) may be one or more cores or micro-cores of a processor. The computing device (400) may also include one or more input devices (410), such as a touchscreen, keyboard, mouse, microphone, touchpad, electronic pen, or any other type of input device. Further, the communication interface (412) may include an integrated circuit for connecting the computing device (400) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, mobile network, or any other type of network) and/or to another device, such as another computing device.

In one embodiment of the invention, the computing device (400) may include one or more output devices (408), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, touchscreen, cathode ray tube (CRT) monitor, projector, or other display device), a printer, external storage, or any other output device. One or more of the output devices may be the same as or different from the input device(s). The input and output device(s) may be locally or remotely connected to the computer processor(s) (402), non-persistent storage (404), and persistent storage (406). Many different types of computing devices exist, and the aforementioned input and output device(s) may take other forms.

One or more embodiments of the invention may be implemented using instructions executed by one or more processors of the data management device. Further, such instructions may correspond to computer readable instructions that are stored on one or more non-transitory computer readable mediums.

One or more embodiments of the invention may improve the operation of one or more computing devices. More specifically, embodiments of the invention improve the efficiency of performing storage operations in a data cluster. The efficiency is improved by implementing erasure coding procedures and performing deduplication on data. The erasure coding procedure includes generating additional portions of data (e.g., parity chunks) associated with the data. The deduplicated data and the additional portions of data may be stored across multiple fault domains. In this manner, if one or more fault domains (up to the number tolerated by the erasure coding scheme) become inaccessible prior to recovery of data, the data stored in the remaining fault domains may be used to recreate the data. This method may replace the need to store multiple copies of the same data across the fault domains, thus reducing the amount of storage used for storing data while maintaining data protection policies in the event of fault domain failures.
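The following sketch illustrates why a parity chunk can stand in for a full copy. It assumes a single XOR parity chunk per stripe, as in the RAID 3 example. Under that assumption, any one lost chunk equals the XOR of the surviving chunks. The sketch is hypothetical and tolerates only one inaccessible fault domain per stripe; tolerating more would require additional parity chunks.

```python
from functools import reduce


def recover_lost_chunk(surviving: list[bytes]) -> bytes:
    """Rebuild the one missing chunk of a single-parity stripe by XOR-ing the survivors."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*surviving))


a0, a1, a2 = b"\x01\x02", b"\x10\x20", b"\x0f\x0f"    # illustrative data chunks
ap1 = bytes(x ^ y ^ z for x, y, z in zip(a0, a1, a2))  # parity chunk
# Suppose the fault domain holding A1 becomes inaccessible.
rebuilt_a1 = recover_lost_chunk([a0, a2, ap1])         # equals a1
```

In the 3:1 scheme, this protection costs one extra chunk per three data chunks. Mirroring, by contrast, requires a full second copy of each chunk.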

Further, embodiments of the invention improve the storage and recovery operations by tracking the location of each portion of data (e.g., data chunks and parity chunks) stored in the data cluster. By tracking these locations, embodiments of the invention may send deduplicated data chunks and/or parity chunks to the appropriate data nodes.

Thus, embodiments of the invention may address the problem of inefficient use of computing resources. This problem arises due to the technological nature of the environment in which data storage operations are performed.

The problems discussed above should be understood as being examples of problems solved by embodiments of the invention disclosed herein, and the invention should not be limited to solving the same/similar problems. The disclosed invention is broadly applicable to address a range of problems beyond those discussed herein.

While the invention has been described above with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.

What is claimed is:
 1. A method for storing data, the method comprising: obtaining data; applying an erasure coding procedure to the data to obtain a plurality of data chunks and a parity chunk; deduplicating the plurality of data chunks to obtain a plurality of deduplicated data chunks; storing, across a plurality of nodes, the plurality of deduplicated data chunks and the parity chunk; and tracking location information for each of the plurality of deduplicated data chunks and the parity chunk.
 2. The method of claim 1, further comprising: obtaining second data; applying the erasure coding procedure to the second data to obtain a second plurality of data chunks and a second parity chunk; deduplicating the second plurality of data chunks to obtain a second plurality of deduplicated data chunks; storing, across the plurality of nodes and using the location information for at least one of the plurality of deduplicated data chunks, the second plurality of deduplicated data chunks and the second parity chunk.
 3. The method of claim 2, wherein a first deduplicated data chunk of the first plurality of deduplicated data chunks is stored in a node of the plurality of nodes, wherein a second deduplicated data chunk of the second plurality of deduplicated data chunks is a modified version of the first deduplicated data chunk, and wherein storing, across the plurality of nodes and using the location information for at least one of the plurality of deduplicated data chunks, the second plurality of deduplicated data chunks and the second parity chunk comprises storing the second deduplicated data chunk on the node of the plurality of nodes.
 4. The method of claim 3, wherein the plurality of data chunks and the parity chunk are associated with a first stripe; wherein the second plurality of data chunks and the second parity chunk are associated with a second stripe, wherein the second stripe is a modified version of the first stripe, wherein storing, across the plurality of nodes and using the location information for at least one of the plurality of deduplicated data chunks, the second plurality of deduplicated data chunks and the second parity chunk further comprises storing the parity chunk and the second parity chunk on a second node of the plurality of nodes.
 5. The method of claim 1, wherein the erasure coding procedure is applied by a deduplicator executing on a node in an accelerator pool, wherein the plurality of nodes is located in a non-accelerator pool, and wherein a data cluster comprises the accelerator pool and the non-accelerator pool.
 6. The method of claim 1, wherein applying the erasure coding procedure comprises: dividing the data into data chunks; selecting, from the data chunks, the plurality of data chunks; and generating the parity chunk using the plurality of data chunks.
 7. The method of claim 1, wherein the parity chunk comprises a P parity value.
 8. The method of claim 1, wherein each of the plurality of nodes is in a separate fault domain.
 9. The method of claim 1, wherein deduplicating the plurality of data chunks to obtain the plurality of deduplicated data chunks is performed after a parity value for the plurality of data chunks is calculated.
 10. A non-transitory computer readable medium comprising computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for storing data, the method comprising: obtaining data; applying an erasure coding procedure to the data to obtain a plurality of data chunks and a parity chunk; deduplicating the plurality of data chunks to obtain a plurality of deduplicated data chunks; storing, across a plurality of nodes, the plurality of deduplicated data chunks and the parity chunk; and tracking location information for each of the plurality of deduplicated data chunks and the parity chunk.
 11. The non-transitory computer readable medium of claim 10, the method further comprising: obtaining second data; applying the erasure coding procedure to the second data to obtain a second plurality of data chunks and a second parity chunk; deduplicating the second plurality of data chunks to obtain a second plurality of deduplicated data chunks; storing, across the plurality of nodes and using the location information for at least one of the plurality of deduplicated data chunks, the second plurality of deduplicated data chunks and the second parity chunk.
 12. The non-transitory computer readable medium of claim 11, wherein a first deduplicated data chunk of the first plurality of deduplicated data chunks is stored in a node of the plurality of nodes, wherein a second deduplicated data chunk of the second plurality of deduplicated data chunks is a modified version of the first deduplicated data chunk, and wherein storing, across the plurality of nodes and using the location information for at least one of the plurality of deduplicated data chunks, the second plurality of deduplicated data chunks and the second parity chunk comprises storing the second deduplicated data chunk on the node of the plurality of nodes.
 13. The non-transitory computer readable medium of claim 12, wherein the plurality of data chunks and the parity chunk are associated with a first stripe; wherein the second plurality of data chunks and the second parity chunk are associated with a second stripe, wherein the second stripe is a modified version of the first stripe, wherein storing, across the plurality of nodes and using the location information for at least one of the plurality of deduplicated data chunks, the second plurality of deduplicated data chunks and the second parity chunk further comprises storing the parity chunk and the second parity chunk on a second node of the plurality of nodes.
 14. The non-transitory computer readable medium of claim 10, wherein the erasure coding procedure is applied by a deduplicator executing on a node in an accelerator pool, wherein the plurality of nodes is located in a non-accelerator pool, and wherein a data cluster comprises the accelerator pool and the non-accelerator pool.
 15. The non-transitory computer readable medium of claim 10, wherein applying the erasure coding procedure comprises: dividing the data into data chunks; selecting, from the data chunks, the plurality of data chunks; and generating the parity chunk using the plurality of data chunks.
 16. The non-transitory computer readable medium of claim 10, wherein the parity chunk comprises a P parity value.
 17. The non-transitory computer readable medium of claim 10, wherein each of the plurality of nodes is in a separate fault domain.
 18. The non-transitory computer readable medium of claim 10, wherein deduplicating the plurality of data chunks to obtain the plurality of deduplicated data chunks is performed after a parity value for the plurality of data chunks is calculated.
 19. A data cluster, comprising: a plurality of data nodes comprising an accelerator pool and a non-accelerator pool, wherein the accelerator pool comprises a data node, and the non-accelerator pool comprises a plurality of data nodes; wherein the data node of the accelerator pool is programmed to: obtain data; apply an erasure coding procedure to the data to obtain a plurality of data chunks and a parity chunk; deduplicate the plurality of data chunks to obtain a plurality of deduplicated data chunks; store, across a plurality of nodes, the plurality of deduplicated data chunks and the parity chunk; and track location information for each of the plurality of deduplicated data chunks and the parity chunk.
 20. The data cluster of claim 19, wherein the data node is further programmed to: obtain second data; apply the erasure coding procedure to the second data to obtain a second plurality of data chunks and a second parity chunk; deduplicate the second plurality of data chunks to obtain a second plurality of deduplicated data chunks; and store, across the plurality of nodes and using the location information for at least one of the plurality of deduplicated data chunks, the second plurality of deduplicated data chunks and the second parity chunk.